K8s Application Liveness Probes and Container Start/Stop Hooks

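A healthy Pod does not guarantee that the Docker container inside it is healthy, and a healthy container does not guarantee that the service running inside it is healthy. Detecting these states correctly is therefore central to application health. livenessProbe determines whether the container is healthy; readinessProbe determines whether the service in the container is ready to serve. Both probes matter: a Pod should only receive traffic through a Service once its readinessProbe has passed, otherwise users may hit failed requests. Setting a readinessProbe also lets a rolling update judge the state of the service in each new container, so the application keeps serving healthy responses during the update.

livenessProbe and readinessProbe each support three kinds of check: exec (run a command in the container), tcpSocket, and httpGet. The postStart and preStop hooks are declared with the same handler fields, although the Kubernetes API documentation notes that tcpSocket is not actually supported for hooks.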
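To make the rolling-update point concrete, here is a minimal sketch of a Deployment that declares both probes. The Deployment name, the `app: myapp` label, the replica count, and the rollout settings are assumptions chosen for illustration; only the image and the probed path come from the examples in this post.

```yaml
# Sketch only: names, labels, replica count and rollout settings are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy                  # hypothetical name
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp                      # hypothetical label
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0               # never take an old pod down before a new one is Ready
      maxSurge: 1                     # bring up one extra pod at a time
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80
        readinessProbe:               # gates traffic and rollout progress
          httpGet:
            port: http
            path: /index.html
          periodSeconds: 3
        livenessProbe:                # restarts the container if it stops responding
          httpGet:
            port: http
            path: /index.html
          initialDelaySeconds: 5
          periodSeconds: 3
```

With `maxUnavailable: 0`, the rollout should only replace an old pod after a new pod's readinessProbe has succeeded, which is the behavior the paragraph above relies on.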

livenessProbe

kubectl explain pods.spec.containers.livenessProbe

livenessProbe supports three kinds of liveness check: tcpSocket, exec, and httpGet. Two of them are demonstrated below.

exec liveness probe

Create a YAML file with the following content:

vim liveness-exec.yaml

apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-pod      
  namespace: default
spec:
  containers:
  - name: liveness-exec-container
    image: busybox:latest
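    imagePullPolicy: IfNotPresent # image pull policy; here, pull only if the image is not already present
    command: ["/bin/sh", "-c", "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 3600"] # create /tmp/healthy, sleep 30s, delete it, then sleep 3600s
    livenessProbe:  # liveness check: decides whether the container is healthy (readinessProbe decides whether the service in the container is ready)
      exec: # probe by running a command; tcpSocket and httpGet probes are also supported
        command: ["test", "-e", "/tmp/healthy"]
      initialDelaySeconds: 1  # default 0s; how long after container start before probing begins
      periodSeconds: 3  # default 10s; interval between probes
      failureThreshold: 3  # default 3; consecutive failures before the probe counts as failed
      successThreshold: 1  # default 1; consecutive successes before the probe counts as passed (must be 1 for liveness)
      timeoutSeconds: 1 # default 1s; probe timeout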

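Once this Pod is created, the container creates /tmp/healthy, sleeps for 30s, and then deletes the file. The health check starts 1s after the container starts, tests every 3s whether /tmp/healthy exists, and treats 3 consecutive failures as a failed check. When the liveness check fails, the container is restarted. The pod listing below shows the restart count.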

[root@k8s001 rexyan]# kubectl get pods 
NAME                READY   STATUS    RESTARTS   AGE
liveness-exec-pod   1/1     Running   5          6m17s

View the details:

[root@k8s001 rexyan]# kubectl describe pods liveness-exec-pod
Name:               liveness-exec-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               k8s002/172.20.245.189
Start Time:         Sun, 19 May 2019 16:05:01 +0800
Labels:             <none>
Annotations:        <none>
Status:             Running
IP:                 10.244.2.2
Containers:
  liveness-exec-container:
    Container ID:  docker://b6d08991993bb306f32b58f7bcc71651ac2b68d1021a05634bcae6832bbbe169
    Image:         busybox:latest
    Image ID:      docker-pullable://docker.io/busybox@sha256:4b6ad3a68d34da29bf7c8ccb5d355ba8b4babcad1f99798204e7abb43e54ee3d
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 3600
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sun, 19 May 2019 16:10:48 +0800
      Finished:     Sun, 19 May 2019 16:11:57 +0800
    Ready:          False
    Restart Count:  5
    Liveness:       exec [test -e /tmp/healthy] delay=1s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-vckdx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vckdx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  7m16s                  default-scheduler  Successfully assigned default/liveness-exec-pod to k8s002
  Normal   Pulling    7m16s                  kubelet, k8s002    Pulling image "busybox:latest"
  Normal   Pulled     7m14s                  kubelet, k8s002    Successfully pulled image "busybox:latest"
  Normal   Killing    4m17s (x3 over 6m35s)  kubelet, k8s002    Container liveness-exec-container failed liveness probe, will be restarted
  Normal   Created    3m47s (x4 over 7m14s)  kubelet, k8s002    Created container liveness-exec-container
  Normal   Started    3m47s (x4 over 7m13s)  kubelet, k8s002    Started container liveness-exec-container
  Normal   Pulled     3m47s (x3 over 6m5s)   kubelet, k8s002    Container image "busybox:latest" already present on machine
  Warning  Unhealthy  2m5s (x13 over 6m41s)  kubelet, k8s002    Liveness probe failed:

The Containers section shows the health check settings configured above:

Restart Count:  5
Liveness:       exec [test -e /tmp/healthy] delay=1s timeout=1s period=3s #success=1 #failure=3

http get liveness probe

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http-pod
  namespace: default
spec:
  containers:
  - name: liveness-http-get-container
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent 
    ports:
    - name: http
      containerPort: 80
    livenessProbe: 
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1  
      periodSeconds: 3  
      failureThreshold: 3  
      successThreshold: 1
      timeoutSeconds: 1

Check the container status:

[root@k8s001 rexyan]# kubectl get pods 
NAME                READY   STATUS             RESTARTS   AGE
liveness-exec-pod   0/1     CrashLoopBackOff   9          23m
liveness-http-pod   1/1     Running            0          104s

View the details:

[root@k8s001 rexyan]# kubectl describe pods liveness-http-pod
Name:               liveness-http-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               k8s003/172.20.245.191
Start Time:         Sun, 19 May 2019 16:27:15 +0800
Labels:             <none>
Annotations:        <none>
Status:             Running
IP:                 10.244.1.3
Containers:
  liveness-http-get-container:
    Container ID:   docker://9cb65d175dc8263f54891b597e3a5f4a334f20c4ab636d532887cabfeb7cff3c
    Image:          ikubernetes/myapp:v1
    Image ID:       docker-pullable://docker.io/ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 19 May 2019 16:27:18 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:http/index.html delay=1s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-vckdx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vckdx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  2m58s  default-scheduler  Successfully assigned default/liveness-http-pod to k8s003
  Normal  Pulling    2m58s  kubelet, k8s003    Pulling image "ikubernetes/myapp:v1"
  Normal  Pulled     2m55s  kubelet, k8s003    Successfully pulled image "ikubernetes/myapp:v1"
  Normal  Created    2m55s  kubelet, k8s003    Created container liveness-http-get-container
  Normal  Started    2m55s  kubelet, k8s003    Started container liveness-http-get-container
[root@k8s001 rexyan]#

The Containers section shows the health check settings configured above:

Restart Count:  0
Liveness:       http-get http://:http/index.html delay=1s timeout=1s period=3s #success=1 #failure=3

Now enter the container manually and delete the index.html page that the health check requests:

[root@k8s001 rexyan]# kubectl get pods 
NAME                READY   STATUS             RESTARTS   AGE
liveness-exec-pod   0/1     CrashLoopBackOff   11         28m
liveness-http-pod   1/1     Running            0          6m4s

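[root@k8s001 rexyan]# kubectl exec -it liveness-http-pod -- /bin/sh
/ # rm -f /usr/share/nginx/html/index.html

Checking the pod status again shows that the pod has restarted once. The restart recreates the container from the image, so the deleted file is back and no further restarts occur.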

[root@k8s001 rexyan]# kubectl get pods 
NAME                READY   STATUS             RESTARTS   AGE
liveness-exec-pod   0/1     CrashLoopBackOff   11         30m
liveness-http-pod   1/1     Running            1          8m12s
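The two walkthroughs above cover exec and httpGet. For completeness, here is a minimal sketch of the third kind of check, tcpSocket, reusing the same image and timing values. The pod and container names are hypothetical and this manifest was not part of the original test run.

```yaml
# Sketch only: pod and container names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp-pod              # hypothetical name
  namespace: default
spec:
  containers:
  - name: liveness-tcp-container      # hypothetical name
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      tcpSocket:                      # passes if a TCP connection to the port can be opened
        port: http
      initialDelaySeconds: 1
      periodSeconds: 3
      failureThreshold: 3
      successThreshold: 1
      timeoutSeconds: 1
```

A TCP check only proves that the port accepts connections. Deleting index.html as in the httpGet walkthrough would therefore not be expected to trigger a restart here, since the web server keeps listening on port 80.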

readinessProbe

kubectl explain pods.spec.containers.readinessProbe

readinessProbe also supports three kinds of check: tcpSocket, exec, and httpGet. One of them is demonstrated below.

http get readiness probe

apiVersion: v1
kind: Pod
metadata:
  name: readiness-http-pod
  namespace: default
spec:
  containers:
  - name: readiness-http-get-container
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent 
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1  
      periodSeconds: 3  
      failureThreshold: 3  
      successThreshold: 1
      timeoutSeconds: 1

[root@k8s001 rexyan]# kubectl create -f readiness-http-get.yaml
pod/readiness-http-pod created

[root@k8s001 rexyan]# kubectl get pods 
NAME                 READY   STATUS    RESTARTS   AGE
liveness-http-pod    1/1     Running   1          26m
readiness-http-pod   1/1     Running   0          5s

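Next, enter the container and delete index.html:

[root@k8s001 rexyan]# kubectl exec -it readiness-http-pod -- /bin/sh
/ # rm -f /usr/share/nginx/html/index.html

Check the pod info: the READY count of readiness-http-pod has dropped to 0. In the READY column, the number before the / is how many containers in the pod are ready, and the number after it is the total number of containers in the pod. Unlike a failed liveness probe, a failed readiness probe does not restart the container, so RESTARTS stays at 0.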

[root@k8s001 rexyan]# kubectl get pods 
NAME                 READY   STATUS    RESTARTS   AGE
liveness-http-pod    1/1     Running   1          30m
readiness-http-pod   0/1     Running   0          3m43s

Enter the container and write content back into nginx's index file:

[root@k8s001 rexyan]# kubectl exec -it readiness-http-pod -- /bin/sh
/ # echo "hi k8s" >> /usr/share/nginx/html/index.html

Check the pod info again: the READY count has gone from 0 back to 1.

[root@k8s001 rexyan]# kubectl get pods 
NAME                 READY   STATUS    RESTARTS   AGE
liveness-http-pod    1/1     Running   1          38m
readiness-http-pod   1/1     Running   0          11m

View the detailed pod info:

[root@k8s001 rexyan]# kubectl describe pods readiness-http-pod 
Name:               readiness-http-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               k8s002/172.20.245.189
Start Time:         Sun, 19 May 2019 16:54:04 +0800
Labels:             <none>
Annotations:        <none>
Status:             Running
IP:                 10.244.2.3
Containers:
  readiness-http-get-container:
    Container ID:   docker://2989185e07600a552f6a57ecc3e813156002e2218701da07da8b2efbfaf7c966
    Image:          ikubernetes/myapp:v1
    Image ID:       docker-pullable://docker.io/ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 19 May 2019 16:54:07 +0800
    Ready:          True
    Restart Count:  0
    Readiness:      http-get http://:http/index.html delay=1s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vckdx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-vckdx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vckdx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  14m                   default-scheduler  Successfully assigned default/readiness-http-pod to k8s002
  Normal   Pulling    14m                   kubelet, k8s002    Pulling image "ikubernetes/myapp:v1"
  Normal   Pulled     14m                   kubelet, k8s002    Successfully pulled image "ikubernetes/myapp:v1"
  Normal   Created    14m                   kubelet, k8s002    Created container readiness-http-get-container
  Normal   Started    14m                   kubelet, k8s002    Started container readiness-http-get-container
  Warning  Unhealthy  4m4s (x134 over 10m)  kubelet, k8s002    Readiness probe failed: HTTP probe failed with statuscode: 404
[root@k8s001 rexyan]#

The Containers section shows the health check settings configured above:

Restart Count:  0
Readiness:      http-get http://:http/index.html delay=1s timeout=1s period=3s #success=1 #failure=3
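The READY column matters mainly because of Services: a pod that is not ready is left out of the Service's ready endpoints. The sketch below shows how the readiness pod could be exposed to observe this. The `app: myapp` label and the Service name `myapp-svc` are assumptions; the original manifest carries no labels, so one is added here for the selector to match.

```yaml
# Sketch only: the label and the Service name are assumptions, not from the original manifests.
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http-pod
  namespace: default
  labels:
    app: myapp                        # hypothetical label so the Service can select this pod
spec:
  containers:
  - name: readiness-http-get-container
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      failureThreshold: 3
      successThreshold: 1
      timeoutSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc                     # hypothetical name
  namespace: default
spec:
  selector:
    app: myapp                        # matches the hypothetical pod label above
  ports:
  - name: http
    port: 80
    targetPort: http                  # refers to the named container port
```

Running `kubectl get endpoints myapp-svc` before and after deleting index.html should show the pod's IP disappearing from the endpoint list while the probe fails, and returning once the file is restored.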

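Container start and stop hooks

A container has a hook that runs right after it starts and one that runs right before it is terminated: postStart and preStop respectively.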

postStart

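kubectl explain pods.spec.containers.lifecycle.postStart

postStart is declared with the same handler fields as the probes: tcpSocket, exec, and httpGet. The API documentation marks tcpSocket as not supported for lifecycle hooks, so in practice use exec or httpGet.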
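As an illustration, a minimal postStart sketch using the exec form might look like this. The pod name, container name, and the marker file /tmp/poststart are hypothetical.

```yaml
# Sketch only: names and the marker file path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: poststart-exec-pod            # hypothetical name
  namespace: default
spec:
  containers:
  - name: poststart-exec-container    # hypothetical name
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "sleep 3600"]
    lifecycle:
      postStart:
        exec:                         # runs inside the container right after it is created
          command: ["/bin/sh", "-c", "echo started > /tmp/poststart"]
```

The hook runs asynchronously with respect to the container's main command, so the main process should not assume the hook has already finished. If the hook fails, the container is killed.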

preStop

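kubectl explain pods.spec.containers.lifecycle.preStop

preStop is declared the same way: tcpSocket, exec, and httpGet fields exist, but tcpSocket is not supported for hooks, so in practice use exec or httpGet.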
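A minimal preStop sketch using the exec form is shown below. The pod and container names are hypothetical, and the 5-second sleep is an arbitrary illustrative value, not a recommendation from the original post.

```yaml
# Sketch only: names and the sleep duration are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: prestop-exec-pod              # hypothetical name
  namespace: default
spec:
  terminationGracePeriodSeconds: 30   # default is 30s; the hook and shutdown share this window
  containers:
  - name: prestop-exec-container      # hypothetical name
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    lifecycle:
      preStop:
        exec:                         # runs before the container receives its termination signal
          command: ["/bin/sh", "-c", "sleep 5"]
```

A short pause like this is one common way to give in-flight requests time to finish before the container is stopped. The hook counts against the termination grace period, so it should complete well within that limit.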